Index version path #1012
Merged
amywieliczka force-pushed the index-version-path branch from 47f215d to 36387ea (June 25, 2024 00:56)
amywieliczka force-pushed the index-version-path branch from 0f9bb31 to a1dfecf (June 25, 2024 23:54)
amywieliczka force-pushed the index-version-path branch 2 times, most recently from 69400d3 to dacc492 (June 26, 2024 00:26)
amywieliczka force-pushed the index-version-path branch from dacc492 to d1ab7b4 (June 27, 2024 21:21)
bibliotechy reviewed Jul 8, 2024, commenting on:
    verbed = "published" if alias == 'rikolti-prd' else "staged"
You verbed the noun verb!
bibliotechy previously approved these changes Jul 8, 2024
@amywieliczka This looks great. Glad we are using pools!
This was linked to issues Jul 8, 2024
barbarahui reviewed Jul 9, 2024, commenting on:
    description=(
        "Creates an empty index at rikolti-<name>; if no name "
        "provided, uses the current timestamp. Adds the index to "
        "the rikolti-stg alias."
Super minor thing, but this description isn't quite right.
barbarahui previously approved these changes Jul 9, 2024
amywieliczka dismissed stale reviews from barbarahui and bibliotechy via 60ab54c (July 11, 2024 20:15)
Schema Updates and Changes to Existing Indexing to Stage Processes:

- Rename `update_stage_index` to `index_collection` and add `alias` as a parameter to `index_collection` to generalize this function.
- Rename `add_page` to `index_page`, update the bulk OpenSearch action to be `index` rather than `create`, and use `refresh: true` as a parameter to the bulk OpenSearch request. The `index` action creates OpenSearch documents if they don't already exist and overwrites them if they do. The `refresh: true` parameter makes re-indexed documents available to search immediately on all shards (at an index-time cost to the cluster), so that our subsequent delete-by-query request runs against the most up-to-date version of every record.
- Add `page`, `version_path`, and `indexed_at` to records just prior to indexing. `indexed_at` is defined as the datetime at the start of indexing.
- Move `delete_collection_records_from_index` to run just after the bulk indexing, and update `delete_collection_records_from_index` to first query for "outdated records" (records of the given collection ID that don't match the given data version) before deleting all of these outdated records. The query lets us report the versions of each outdated record.
- Add `version` and `index` to the SNS event sent to the registry at the end of the Airflow `update_stage_index_for_collection_task`.
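The stage-indexing changes above can be sketched in Python as a set of offline, request-building helpers. This is a minimal sketch under stated assumptions, not Rikolti's code. The function names, the `id` and `collection_id` field names, and the assumption that `version_path` is mapped as a `keyword` field (so a `term` query matches it exactly) are all hypothetical. The sketch covers three steps. Records are stamped with `page`, `version_path`, and a single `indexed_at` value. A `_bulk` body is built from `index` actions, which create or overwrite documents, and is sent with `refresh=true`. A `bool` query then selects a collection's records whose `version_path` differs from the current one.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def stamp_records(records, page, version_path, indexed_at):
    """Return copies of the records with page, version_path and indexed_at added."""
    return [
        {**record, "page": page, "version_path": version_path,
         "indexed_at": indexed_at}
        for record in records
    ]


def build_bulk_body(index_name, records, id_field="id"):
    """Build an NDJSON _bulk body using the `index` action.

    `index` creates the document, or replaces it if the _id already exists;
    `create` would instead fail for documents that already exist.
    """
    lines = []
    for record in records:
        action = {"index": {"_index": index_name, "_id": record[id_field]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(record))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Query-string parameters for POST /_bulk: refresh=true makes the writes
# searchable before the delete-by-query step runs.
BULK_PARAMS = {"refresh": "true"}


def outdated_records_query(collection_id, version_path):
    """Match this collection's records whose version_path is not the current one."""
    return {
        "query": {
            "bool": {
                "filter": [{"term": {"collection_id": collection_id}}],
                "must_not": [{"term": {"version_path": version_path}}],
            }
        }
    }


def summarize_versions(search_response):
    """Count outdated records per version_path from a _search response."""
    hits = search_response["hits"]["hits"]
    return Counter(hit["_source"].get("version_path", "unknown") for hit in hits)


if __name__ == "__main__":
    # indexed_at is captured once, at the start of indexing.
    indexed_at = datetime.now(timezone.utc).isoformat()
    docs = stamp_records([{"id": "example-1"}], page="0",
                         version_path="example/version/path",
                         indexed_at=indexed_at)
    print(build_bulk_body("rikolti-example", docs), end="")
    print(json.dumps(outdated_records_query("example-collection",
                                            "example/version/path")))
```

The same query body could be sent to `_search` to report the outdated versions and then to `_delete_by_query` to remove them. Note that `_search` returns only 10 hits by default. Reporting on a large collection would therefore need pagination or a `terms` aggregation on `version_path` rather than the hit counting shown here.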
Migration script:

- Add `version_path: initial`, `indexed_at: <time migration script started>`, and `page: unknown` to records already in the index via a re-index.

Publish Processes:
- Rename `update_stage_index_for_collection_task` to `index_collection_task` to generalize between the -stg and -prd index aliases.
- Add `stage_collection_task` and `publish_collection_task`, which both call `index_collection_task` with a different alias.
- Add a `publish_collection` DAG.

Pooling:
- `rikolti_opensearch_pool` (an Airflow pool): since we have just one cluster across all stage and prod indices, any and all Airflow tasks hitting OpenSearch should be added to this pool. We can configure the pool using the Airflow UI, and should monitor the OpenSearch cluster's performance using the CloudWatch Dashboard.

Developer Candy:
- `OPENSEARCH_IGNORE_TLS=True` in the environment.
- `rikolti-stg` and `rikolti-prd` aliases to a new OpenSearch cluster (as one would get when running a new docker compose).
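The migration script's backfill could be expressed as a single `_reindex` request whose Painless script sets the three new fields on every copied document. This is a sketch that assumes a hypothetical helper name. It also assumes records are copied from the existing index into a new one, because a reindex needs a destination index different from its source. Passing the values as script `params` lets one compiled script serve every document.

```python
from datetime import datetime, timezone


def migration_reindex_body(source_index, dest_index, migration_started_at):
    """Build a POST /_reindex body that backfills version_path, indexed_at and page."""
    script = (
        "ctx._source.version_path = params.version_path; "
        "ctx._source.indexed_at = params.indexed_at; "
        "ctx._source.page = params.page;"
    )
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "script": {
            "lang": "painless",
            "source": script,
            "params": {
                "version_path": "initial",
                "indexed_at": migration_started_at,
                "page": "unknown",
            },
        },
    }


if __name__ == "__main__":
    # Captured once, before the request is sent, so every migrated record
    # gets the same indexed_at value.
    started = datetime.now(timezone.utc).isoformat()
    print(migration_reindex_body("rikolti-old-example", "rikolti-new-example", started))
```

Capturing `migration_started_at` a single time mirrors how `indexed_at` is defined for regular indexing: one timestamp taken at the start of the run.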
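A standard-library analogue can illustrate why one shared pool protects the single cluster. This is not Airflow's API. `SlotPool` and `run_opensearch_tasks` are hypothetical names, and the sketch models a pool as a counting semaphore that caps how many OpenSearch-bound tasks run at once. In Airflow itself, a task is assigned to the pool through the operator's `pool` argument (e.g. `pool="rikolti_opensearch_pool"`), and the number of slots is what gets configured in the UI.

```python
import threading
import time


class SlotPool:
    """Counting-semaphore stand-in for an Airflow pool (illustration only)."""

    def __init__(self, slots):
        self._sem = threading.BoundedSemaphore(slots)
        self._lock = threading.Lock()
        self._running = 0
        self.max_running = 0  # highest number of tasks seen holding a slot at once

    def __enter__(self):
        self._sem.acquire()  # blocks until a slot is free
        with self._lock:
            self._running += 1
            self.max_running = max(self.max_running, self._running)
        return self

    def __exit__(self, *exc_info):
        with self._lock:
            self._running -= 1
        self._sem.release()
        return False  # do not swallow exceptions


def run_opensearch_tasks(pool, n_tasks, work_seconds=0.01):
    """Run n_tasks threads that each hold a pool slot while 'talking to OpenSearch'."""
    done = []
    done_lock = threading.Lock()

    def task(task_id):
        with pool:
            time.sleep(work_seconds)  # stand-in for a bulk or delete-by-query request
            with done_lock:
                done.append(task_id)

    threads = [threading.Thread(target=task, args=(i,)) for i in range(n_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(done)
```

However many tasks are queued, at most `slots` of them hold a slot at the same time while the rest wait. That is the property that keeps stage and prod indexing from jointly overloading one cluster.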
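The developer conveniences might look something like the helpers below. Several parts are assumptions for illustration: the function names, the accepted spellings of a true value for `OPENSEARCH_IGNORE_TLS`, and the timestamp format in `new_index_name`, which echoes the script description quoted in the review. The alias body follows the shape of the OpenSearch `_aliases` API, a list of `actions` each containing an `add`.

```python
import os
from datetime import datetime, timezone


def opensearch_verify_certs(environ=None):
    """Hypothetical: return False (skip TLS verification) when OPENSEARCH_IGNORE_TLS is truthy."""
    env = os.environ if environ is None else environ
    return env.get("OPENSEARCH_IGNORE_TLS", "").strip().lower() not in ("true", "1", "yes")


def new_index_name(name=None, now=None):
    """Return rikolti-<name>, falling back to a UTC timestamp (format is an assumption)."""
    if not name:
        now = now or datetime.now(timezone.utc)
        name = now.strftime("%Y%m%d%H%M%S")
    return f"rikolti-{name}"


def initial_alias_actions(stage_index, prod_index):
    """Build a POST /_aliases body pointing rikolti-stg and rikolti-prd at indices on a fresh cluster."""
    return {
        "actions": [
            {"add": {"index": stage_index, "alias": "rikolti-stg"}},
            {"add": {"index": prod_index, "alias": "rikolti-prd"}},
        ]
    }
```

With opensearch-py, the result of `opensearch_verify_certs()` would typically be passed as the client's `verify_certs` argument when running against a local docker compose cluster with a self-signed certificate.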